With the ever-growing burden of training deep learning models, transfer learning has been widely adopted in many emerging deep learning algorithms. Transformer models such as BERT are the dominant players in natural language processing and use transfer learning as the de facto standard training method. Several big-data companies have released pre-trained models trained on popular datasets, and end users and researchers fine-tune these models with their own datasets. Transfer learning greatly reduces the time and effort of model training, but it comes at the cost of security concerns. In this paper, we present a new observation that pre-trained models and fine-tuned models are highly similar in their weight values. We also demonstrate that vendor-specific computation patterns exist even for the same model. Building on these findings, we propose a new model extraction attack that uses the vendor-specific computation patterns to reveal the model architecture and the pre-trained model underlying a black-box victim model, and then estimates the entire set of model weights from the weight-value similarity between the fine-tuned and pre-trained models. We also show that weight similarity can be exploited to improve the feasibility of model extraction through a novel weight-extraction pruning.
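The underlying weight-similarity observation can be checked with a short sketch along these lines (the checkpoint file names are placeholders, not artifacts from the paper; the checkpoints are assumed to be state dicts sharing an architecture):

```python
# Minimal sketch: per-layer cosine similarity between a pre-trained and a
# fine-tuned checkpoint of the same architecture.
import torch
import torch.nn.functional as F

pretrained = torch.load("bert_pretrained.pt", map_location="cpu")  # state_dict (placeholder path)
finetuned = torch.load("bert_finetuned.pt", map_location="cpu")    # state_dict (placeholder path)

for name, w_pre in pretrained.items():
    # skip keys missing from the other checkpoint and 1-D vectors (biases, layer norms)
    if name not in finetuned or w_pre.ndim < 2:
        continue
    w_ft = finetuned[name]
    sim = F.cosine_similarity(w_pre.flatten(), w_ft.flatten(), dim=0)
    print(f"{name}: cosine similarity = {sim.item():.4f}")
```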
Automatic medical image classification is a very important field where the use of AI has the potential to have a real social impact. However, there are still many challenges that act as obstacles to practically effective solutions. One of them is that most medical imaging datasets suffer from class imbalance, which causes existing AI techniques, particularly neural-network-based deep-learning methodologies, to perform poorly in such scenarios. This makes the area an interesting and active research focus. In this study, we propose a novel loss function for training neural network models that mitigates this critical issue. Through rigorous experiments on three independently collected datasets from three different medical imaging domains, we empirically show that our proposed loss function consistently performs well, with an improvement of 2%-10% in macro F1 over the baseline models. We hope that our work will precipitate new research toward a more generalized approach to medical image classification.
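The proposed loss is not spelled out in the abstract; as a point of reference only, a common class-imbalance-aware objective such as the focal loss can be sketched as follows (this is not the authors' loss):

```python
# Illustrative only: focal loss (Lin et al.) as one common class-imbalance-aware
# objective for classification; the abstract's proposed loss is not specified.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """logits: (N, C), targets: (N,) integer class labels."""
    ce = F.cross_entropy(logits, targets, weight=alpha, reduction="none")
    p_t = torch.exp(-ce)                       # probability assigned to the true class
    return ((1.0 - p_t) ** gamma * ce).mean()  # down-weight easy, well-classified samples

# Example: 3 classes, heavier weight on the rare class 2 (weights are illustrative).
logits = torch.randn(8, 3)
targets = torch.randint(0, 3, (8,))
loss = focal_loss(logits, targets, alpha=torch.tensor([1.0, 1.0, 3.0]))
```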
The use of machine learning algorithms to predict the behavior of complex systems is booming. However, the key to using machine learning tools effectively in multi-physics problems, including combustion, is to couple them with physics-based and computational models. The performance of these tools is enhanced when all prior knowledge and physical constraints are embodied in them. In other words, the scientific method must be adapted to bring machine learning into the picture and to make the most of the enormous amounts of data we now generate thanks to advances in numerical computing. This chapter reviews some open opportunities for the application of data-driven reduced-order modeling to combustion systems. Examples are provided of feature extraction from turbulent-combustion data, empirical low-dimensional manifold (ELDM) identification, classification, regression, and reduced-order modeling.
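As a concrete illustration of the manifold-identification step, a minimal PCA-based sketch on a placeholder matrix of thermochemical state variables might look like this (the data, scaling, and number of components are assumptions, not taken from the chapter):

```python
# Sketch: identify a low-dimensional empirical manifold of combustion states via PCA.
# Rows are observations; columns are state variables (e.g., species mass fractions,
# temperature). The data below is a synthetic placeholder.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(10000, 12)                     # placeholder combustion-state matrix
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)   # center and scale each variable

pca = PCA(n_components=2)
manifold_coords = pca.fit_transform(X_scaled)     # 2-D manifold coordinates
print("variance captured:", pca.explained_variance_ratio_.sum())
```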
This paper demonstrates a novel approach to improving pose invariance in face recognition using semantic segmentation features. The proposed Seg-Distilled-ID network jointly learns identification and semantic segmentation tasks, and the segmentation task is then "distilled" into a MobileNet encoder. Performance is benchmarked against three state-of-the-art encoders on a publicly available dataset emphasizing head-pose variation. Experimental evaluations show that the Seg-Distilled-ID network offers notable robustness benefits, achieving 99.9% test accuracy, compared with 96.1% for VGG-19 and 96.3% for InceptionV3 among the benchmarked encoders (ResNet-101, VGG-19, and InceptionV3). This is achieved using roughly one-tenth of the top benchmarked encoder's inference parameters. These results demonstrate that distilled semantic segmentation features can efficiently address pose invariance in face recognition.
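A hedged sketch of the general idea, combining an identity-classification loss with a feature-matching distillation term against a segmentation teacher (the function, shapes, and weighting below are illustrative assumptions, not the paper's implementation):

```python
# Sketch of a joint objective: identity cross-entropy plus a segmentation-feature
# distillation term. Assumes teacher and student feature maps have the same channel count.
import torch.nn.functional as F

def joint_loss(id_logits, id_labels, student_feats, seg_teacher_feats, lam=0.5):
    id_loss = F.cross_entropy(id_logits, id_labels)
    # resize teacher features to the student's spatial resolution before comparing
    teacher = F.interpolate(seg_teacher_feats, size=student_feats.shape[-2:])
    distill_loss = F.mse_loss(student_feats, teacher)
    return id_loss + lam * distill_loss
```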
This project aims to develop and demonstrate an intelligent ground robot, the Agriculture Application Robot (AAR), capable of performing semi-autonomous farming operations for different low-height vegetable crops. The AAR is a lightweight, solar-electric-powered robot that uses intelligent perception to detect and classify plants and their characteristics. The system also features a robotic arm for an autonomous weed-cutting process. The robot can spray fertilizers, insecticides, herbicides, and other liquids onto targets such as crops, weeds, and other pests. In addition, it lays the groundwork for future research on advanced tasks such as yield estimation and crop and soil health monitoring. We present the design of the robot and the associated experiments, which show promising results in real-world environments.
Recent work has shown the benefits of synthetic data for use in computer vision, with applications ranging from autonomous driving to face landmark detection and reconstruction. There are a number of benefits of using synthetic data, from privacy preservation and bias elimination to the quality and feasibility of annotation. Generating human-centered synthetic data is a particular challenge in terms of realism and domain gap, though recent work has shown that effective machine learning models can be trained using synthetic face data alone. We show that this can be extended to include the full body by building on the pipeline of Wood et al. to generate synthetic images of humans in their entirety, with ground-truth annotations for computer vision applications. In this report we describe how we construct a parametric model of the face and body, including articulated hands; our rendering pipeline to generate realistic images of humans based on this body model; an approach for training DNNs to regress a dense set of landmarks covering the entire body; and a method for fitting our body model to dense landmarks predicted from multiple views.
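The final fitting step can be sketched as a multi-view reprojection optimization over the body-model parameters; `body_model` and `project` below are hypothetical callables assumed for illustration, not the authors' API:

```python
# Sketch: fit parametric body-model parameters to dense 2-D landmarks detected in
# multiple calibrated views by minimizing reprojection error with gradient descent.
import torch

def fit_to_landmarks(body_model, project, detected_2d, cameras, n_iters=200):
    params = torch.zeros(body_model.num_params, requires_grad=True)  # hypothetical attribute
    opt = torch.optim.Adam([params], lr=0.05)
    for _ in range(n_iters):
        opt.zero_grad()
        landmarks_3d = body_model(params)              # (L, 3) model landmarks
        loss = 0.0
        for cam, obs in zip(cameras, detected_2d):     # accumulate multi-view reprojection error
            loss = loss + ((project(landmarks_3d, cam) - obs) ** 2).mean()
        loss = loss + 1e-3 * (params ** 2).sum()       # simple regularizer standing in for a pose/shape prior
        loss.backward()
        opt.step()
    return params.detach()
```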
To generate high-quality rendered images for real-time applications, it is common to trace only a few samples per pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that pixels rendered at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which turns supersampling into an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features, significantly improving their temporal stability. Then a reconstruction network based on a multi-scale U-Net with skip connections is adopted to reconstruct and generate the desired high-resolution image. Experimental results and comparisons show that our proposed method generates higher-quality supersampling results than current state-of-the-art methods without increasing the total number of ray-tracing samples.
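The temporal-accumulation idea, correlating current and previous features and using the correlation to blend them, can be illustrated roughly as follows (a conceptual sketch under assumed shapes, not the paper's network):

```python
# Sketch: per-pixel correlation between current and motion-warped previous feature
# maps, used as a blending weight for temporal accumulation.
import torch
import torch.nn.functional as F

def temporal_accumulate(curr_feat, prev_feat_warped):
    # curr_feat, prev_feat_warped: (B, C, H, W); warping with motion vectors is assumed done upstream
    corr = F.cosine_similarity(curr_feat, prev_feat_warped, dim=1, eps=1e-6)  # (B, H, W)
    weight = corr.clamp(min=0.0).unsqueeze(1)                                 # (B, 1, H, W), in [0, 1]
    # high correlation -> trust history more; low correlation -> fall back to current frame
    return weight * prev_feat_warped + (1.0 - weight) * curr_feat
```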
In this paper we explore the task of modeling (semi) structured object sequences; in particular we focus our attention on the problem of developing a structure-aware input representation for such sequences. In such sequences, we assume that each structured object is represented by a set of key-value pairs which encode the attributes of the structured object. Given a universe of keys, a sequence of structured objects can then be viewed as an evolution of the values for each key over time. We encode and construct a sequential representation using the values for a particular key (Temporal Value Modeling - TVM) and then self-attend over the set of key-conditioned value sequences to create a representation of the structured object sequence (Key Aggregation - KA). We pre-train and fine-tune the two components independently and present an innovative training schedule that interleaves the training of both modules with shared attention heads. We find that this iterative two-part training results in better performance than a unified network with hierarchical encoding, as well as other methods that use a {\em record-view} representation of the sequence \cite{de2021transformers4rec} or a simple {\em flattened} representation of the sequence. We conduct experiments using real-world data to demonstrate the advantage of interleaving TVM-KA on multiple tasks, and we present detailed ablation studies motivating our modeling choices. We find that our approach performs better than flattening sequence objects and also allows us to operate on significantly larger sequences than existing methods.
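A rough sketch of how the two components could be wired together, with TVM encoding each key's value sequence over time and KA self-attending across the per-key representations (module choices, shapes, and pooling are assumptions, not the paper's architecture):

```python
# Sketch: Temporal Value Modeling (per-key value sequence encoder) followed by
# Key Aggregation (self-attention over key-conditioned representations).
import torch
import torch.nn as nn

class TVMKA(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        tvm_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        ka_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.tvm = nn.TransformerEncoder(tvm_layer, num_layers=2)  # attends over time steps
        self.ka = nn.TransformerEncoder(ka_layer, num_layers=2)    # attends over keys

    def forward(self, value_emb):
        # value_emb: (num_keys, seq_len, d_model) - one embedded value sequence per key
        per_key = self.tvm(value_emb).mean(dim=1)          # (num_keys, d_model)
        return self.ka(per_key.unsqueeze(0)).mean(dim=1)   # (1, d_model) sequence representation
```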
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label the attacker is interested in exploiting. A backdoored DNN performs well on clean test images, yet persistently predicts an attacker-defined label for any sample in the presence of the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, there are very few works that explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective in the video domain. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects to that model. We show that poisoned-label image backdoor attacks could be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability in the video domain. And, for the first time, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, where we show that attacking a single modality is enough for achieving a high attack success rate.
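The static versus dynamic temporal extension of an image trigger can be illustrated with a small sketch (the patch placement and motion schedule are assumptions for illustration, not the paper's code):

```python
# Sketch: stamp a backdoor trigger patch onto a video clip. The "static" variant
# keeps the patch fixed across frames; the "dynamic" variant moves it over time.
import torch

def add_trigger(video, patch, dynamic=False):
    # video: (T, C, H, W) clip tensor; patch: (C, h, w) trigger pattern
    T, _, H, W = video.shape
    h, w = patch.shape[-2:]
    poisoned = video.clone()
    for t in range(T):
        x = (t * 4) % (W - w) if dynamic else W - w   # slide horizontally if dynamic
        y = H - h                                     # keep the patch along the bottom edge
        poisoned[t, :, y:y + h, x:x + w] = patch
    return poisoned
```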